A working guide to boosted regression trees.

نویسندگان

  • J Elith
  • J R Leathwick
  • T Hastie
چکیده

1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. We provide code and a tutorial to enable the wider use of BRT by ecologists.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Title: a Working Guide to Boosted Regression Trees

Comment: This is the final submitted manuscript for this paper, without further corrections. It has been reformatted for efficient printing. For a pdf of the final Blackwell publishing version please email Jane Elith SUMMARY 1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data such as non...

متن کامل

Boosted trees for ecological modeling and prediction.

Accurate prediction and explanation are fundamental objectives of statistical analysis, yet they seldom coincide. Boosted trees are a statistical learning method that attains both of these objectives for regression and classification analyses. They can deal with many types of response variables (numeric, categorical, and censored), loss functions (Gaussian, binomial, Poisson, and robust), and p...

متن کامل

An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics

We present results from a large-scale empirical comparison between ten learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We evaluate the methods on binary classification problems using nine performance criteria: accuracy, squared error, cross-entropy, ROC Area, F-score, p...

متن کامل

Boosted regression trees with errors in variables.

In this article, we consider nonparametric regression when covariates are measured with error. Estimation is performed using boosted regression trees, with the sum of the trees forming the estimate of the conditional expectation of the response. Both binary and continuous response regression are investigated. An approach to fitting regression trees when covariates are measured with error is des...

متن کامل

Optimization with Gradient-Boosted Trees and Risk Control

Decision trees effectively represent the sparse, high dimensional and noisy nature of chemical data from experiments. Having learned a function from this data, we may want to thereafter optimize the function, e.g., picking the best chemical process catalyst. In this way, we may repurpose legacy predictive models. This work studies a large-scale, industrially-relevant mixed-integer quadratic opt...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • The Journal of animal ecology

دوره 77 4  شماره 

صفحات  -

تاریخ انتشار 2008